Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations

نویسندگان

  • Patrick Ziering
  • Lonneke van der Plas
چکیده

In this paper, we address the task of languageindependent, knowledge-lean and unsupervised compound splitting, which is an essential component for many natural language processing tasks such as machine translation. Previous methods on statistical compound splitting either include language-specific knowledge (e.g., linking elements) or rely on parallel data, which results in limited applicability. We aim to overcome these limitations by learning compounding morphology from inflectional information derived from lemmatized monolingual corpora. In experiments for Germanic languages, we show that our approach significantly outperforms language-dependent stateof-the-art methods in finding the correct split point and that word inflection is a good approximation for compounding morphology.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Language-independent compound splitting with morphological operations

Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound pa...

متن کامل

A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon

In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon. Our approach relies on transformation rules that relate lexical entries with the one another, and which are automatically extracted from the inflected lexicon based on su...

متن کامل

Unsupervised morphological parsing of Bengali

Unsupervised morphological analysis is the task of segmenting words into prefixes, suffixes and stems without prior knowledge of language-specific morphotactics and morpho-phonological rules. This paper introduces a simple, yet highly effective algorithm for unsupervised morphological learning for Bengali, an Indo-Aryan language that is highly inflectional in nature. When evaluated on a set of ...

متن کامل

To stem or lemmatize a highly inflectional language in a probabilistic IR environment?

Effects of three different morphological methods-lemmatization, stemming and inflectional stem generation-for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four point relevance scale which is partitioned differently in different test settings. Results show that inflectional stem generation which has not been used much in IR, compares well with lemm...

متن کامل

Morphological Analysis of Inflectional Compound Words in Bangla

The addition of inflectional suffixes in Bangla compound words is fairly complex. Normally, when two root words are joined, the corresponding inflectional suffix of each root word is deleted from the final compound word. In Bangla however, the compound word’s individual root words may retain their inflectional suffixes even in the final compound word. This non-deletion of inflection creates an ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016